47 research outputs found

    Scalable Graph Building from Text Data

    In this paper we propose NNCTPH, a new MapReduce algorithm that builds an approximate k-NN graph from large text datasets. The algorithm uses a modified version of Context Triggered Piecewise Hashing to bin the input data into buckets, then performs an exhaustive search inside each bucket to build the graph. It also uses multiple stages to join the different unconnected subgraphs. We experimentally test the algorithm on different datasets consisting of the subjects of spam emails. Although the algorithm is still at an early development stage, it already proves to be four times faster than a MapReduce implementation of NN-Descent for the same quality of produced graph.
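The bucket-then-search idea in this abstract can be sketched on a single machine as follows. This is only an illustration under loud assumptions: `bucket_key` is a hypothetical stand-in (a lowercase prefix) for the modified Context Triggered Piecewise Hashing step, `difflib.SequenceMatcher` is an assumed similarity measure, the sample subjects are made up, and the MapReduce distribution and the subgraph-joining stages are left out entirely.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bucket_key(text, prefix_len=3):
    # Hypothetical stand-in for the paper's modified CTPH hash:
    # texts sharing a lowercase prefix land in the same bucket.
    return text.lower()[:prefix_len]


def build_knn_graph(texts, k=2):
    """Return {index: [(neighbor_index, similarity), ...]}.

    Neighbors are searched exhaustively, but only inside each bucket,
    so the result is an approximate k-NN graph.
    """
    buckets = defaultdict(list)
    for idx, text in enumerate(texts):
        buckets[bucket_key(text)].append(idx)

    graph = {idx: [] for idx in range(len(texts))}
    for members in buckets.values():
        for i in members:
            # Compare item i against every other item in its bucket.
            scored = [
                (j, SequenceMatcher(None, texts[i], texts[j]).ratio())
                for j in members
                if j != i
            ]
            scored.sort(key=lambda pair: pair[1], reverse=True)
            graph[i] = scored[:k]
    return graph


# Made-up sample data, for illustration only.
subjects = [
    "cheap watches for sale",
    "cheap watches on sale now",
    "cheap pills online",
    "win a free cruise",
    "win a free phone",
]
graph = build_knn_graph(subjects, k=2)
```

In this sketch, items in different buckets are never compared, which is why the abstract's additional stages for joining unconnected subgraphs matter.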

    Determining the k in k-means with MapReduce

    In this paper we propose a MapReduce implementation of G-means, a variant of k-means that automatically determines k, the number of clusters. We show that our implementation scales to very large datasets and very large values of k, as the computation cost is proportional to nk. Other techniques, which run a clustering algorithm with different values of k and choose the value that provides the "best" results, have a computation cost proportional to nk². We run experiments that confirm that the processing time is proportional to k. These experiments also show that, because G-means adds new centers progressively, if and where they are needed, it reduces the probability of falling into a local minimum and ultimately finds better centers than classical k-means processing.
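The quadratic cost quoted for the alternative techniques can be checked with a short calculation. As assumptions not stated in the abstract, suppose such a technique runs the clustering once for every candidate value j = 1, ..., k, and that one run with j centers costs on the order of nj; constant factors and iteration counts are ignored.

```latex
\sum_{j=1}^{k} n\,j \;=\; n\,\frac{k(k+1)}{2} \;=\; O\!\left(nk^{2}\right)
```

Under these assumptions the total grows as nk², which is the growth the abstract contrasts with the nk cost of the G-means implementation.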

    Developments on an IEEE 802.15.4-based wireless sensor network, Journal of Telecommunications and Information Technology, 2008, nr 2

    This paper summarizes the ongoing research at the Belgian Royal Military Academy in the field of mobile ad hoc networks in general and wireless sensor networks (WSNs) in particular. In this study, all wireless sensor networks are based on the physical and the medium access layers of the IEEE 802.15.4 low-rate wireless personal area networks standard. The paper gives a short overview of the IEEE 802.15.4 standard in the beaconless mode, together with a description of the sensor nodes and the software used throughout this work. The paper also reports on the development of a packet sniffer for IEEE 802.15.4 integrated in Wireshark. This packet sniffer turns out to be indispensable for debugging purposes. In view of future applications on the wireless network, we made a theoretical study of the effective data capacity and compared this with measurements performed on a real sensor network. The differences between measurements and theory are explained. In the case of geographically meaningful sensor data, it is important to know the relative position of each node. In the last part of the paper we present some experimental results of positioning based on the received signal strength indicators (RSSI). As one could expect, the accuracy of such a method is poor, even in a well-controlled environment. But the method has some potential.

    Breakpoint characterization of large deletions in EXT1 or EXT2 in 10 Multiple Osteochondromas families

    Background: Osteochondromas (cartilage-capped bone tumors) are by far the most commonly treated of all primary benign bone tumors (50%). In 15% of cases, these tumors occur in the context of a hereditary syndrome called multiple osteochondromas (MO), an autosomal dominant skeletal disorder characterized by the formation of multiple cartilage-capped bone tumors at children's metaphyses. MO is caused by various mutations in EXT1 or EXT2, whereby large genomic deletions (single- or multi-exonic) are responsible for up to 8% of MO cases. Methods: Here we report on the first molecular characterization of ten large EXT1 and EXT2 deletions in MO patients. Deletions were initially identified using MLPA or FISH analysis and were subsequently characterized using an MO-specific tiling path array, allele-specific PCR amplification and sequencing analysis. Results: Within the set of ten large deletions, the deleted regions ranged from 2.7 to 260 kb. One EXT2 exon 8 deletion was found to be recurrent. All breakpoints were located outside the coding exons of EXT1 and EXT2. Non-allelic homologous recombination (NAHR) mediated by Alu sequences, microhomology-mediated replication-dependent recombination (MMRDR) and non-homologous end-joining (NHEJ) were hypothesized as the causal mechanisms in different deletions. Conclusions: Molecular characterization of EXT1 and EXT2 deletion breakpoints in MO patients indicates that NAHR between Alu sequences as well as NHEJ are causal and that the majority of these deletions are nonrecurring. These observations emphasize once more the huge genetic variability that is characteristic of MO. To our knowledge, this is the first study characterizing large genomic deletions in EXT1 and EXT2.

    The Magnitude of Global Marine Species Diversity

    Background: The question of how many marine species exist is important because it provides a metric for how much we do and do not know about life in the oceans. We have compiled the first register of the marine species of the world and used this baseline to estimate how many more species, partitioned among all major eukaryotic groups, may be discovered. Results: There are ∼226,000 eukaryotic marine species described. More species were described in the past decade (∼20,000) than in any previous one. Over the past six decades, the number of authors describing new species has been increasing at a faster rate than the number of new species described. We report that there are ∼170,000 synonyms, that 58,000–72,000 species are collected but not yet described, and that 482,000–741,000 more species have yet to be sampled. Molecular methods may add tens of thousands of cryptic species. Thus, there may be 0.7–1.0 million marine species. Past rates of description of new species indicate there may be 0.5 ± 0.2 million marine species. On average 37% (median 31%) of species in over 100 recent field studies around the world might be new to science. Conclusions: Currently, between one-third and two-thirds of marine species may be undescribed, and previous estimates of there being well over one million marine species appear highly unlikely. More species than ever before are being described annually by an increasing number of authors. If the current trend continues, most species will be discovered this century.

    The Science Performance of JWST as Characterized in Commissioning

    This paper characterizes the actual science performance of the James Webb Space Telescope (JWST), as determined from the six-month commissioning period. We summarize the performance of the spacecraft, telescope, science instruments, and ground system, with an emphasis on differences from pre-launch expectations. Commissioning has made clear that JWST is fully capable of achieving the discoveries for which it was built. Moreover, almost across the board, the science performance of JWST is better than expected; in most cases, JWST will go deeper faster than expected. The telescope and instrument suite have demonstrated the sensitivity, stability, image quality, and spectral range that are necessary to transform our understanding of the cosmos through observations spanning from near-Earth asteroids to the most distant galaxies. Comment: 5th version as accepted to PASP; 31 pages, 18 figures; https://iopscience.iop.org/article/10.1088/1538-3873/acb29

    The James Webb Space Telescope Mission

    Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least 4 m. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the 6.5 m James Webb Space Telescope. A generation of astronomers will celebrate their accomplishments for the life of the mission, potentially as long as 20 years, and beyond. This report and the scientific discoveries that follow are extended thank-you notes to the 20,000 team members. The telescope is working perfectly, with much better image quality than expected. In this and accompanying papers, we give a brief history, describe the observatory, outline its objectives and current observing program, and discuss the inventions and people who made it possible. We cite detailed reports on the design and the measured performance on orbit. Comment: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures

    Multi-agent system for APT detection
